Use NTP admin API instead of sled agent, remove sled agent time_sync API #8597
Conversation
Force-pushed from c4faf10 to 8f65850
FYI, I ran this on a4x2 - it didn't work, until I fixed an SMF typo, now it works. This is tested implicitly by the "helios / deploy" job, which relies on RSS, and also relies on the timesync API in the NTP admin server.
Looks great, thanks!
@@ -1464,29 +1464,6 @@
        }
      }
    },
    "/timesync": {
I wonder if there's any documentation that relies on checking the response from this endpoint in the logs to debug an NTP sync issue or something similar. I have a vague memory of something like that. If anyone remembers, that documentation should be updated. Perhaps an announcement on Matrix might be good as well? I've certainly used it before if an omicron deployment was taking too long to start.
LGTM, thanks!
FYI, I ran this on a4x2 - it didn't work, until I fixed an SMF typo, now it works. This is tested implicitly by the "helios / deploy" job, which relies on RSS, and also relies on the timesync API in the NTP admin server.
I don't see an SMF change in this PR - was it something a4x2-specific outside omicron?
let ntp_addresses: Vec<_> = service_plan
    .services
    .iter()
    .flat_map(|(_, sled_config)| {
I'm kinda tempted to suggest we do something here to confirm we have exactly one NTP zone per sled (probably via asserting, since we came up with the plan and it should always be correct?). Maybe that would be overkill, since 0 or more-than-one NTP zones on a sled is going to cause problems pretty quickly anyway? Something like (using exactly_one() from itertools):
let ntp_addresses: Vec<_> = service_plan
.services
.iter()
.map(|(_, sled_config)| {
sled_config.zones.iter().filter_map(|zone_config| {
// ...extract ntp_admin_addr...
})
.exactly_one()
.expect("plan should specify one NTP zone per sled")
})
.collect();
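To make the suggestion above concrete: itertools' exactly_one() succeeds only if the iterator yields a single element, and fails for zero or multiple elements. This is a standalone sketch of that behavior using a hand-rolled stand-in for the adaptor (so it compiles without the itertools crate); the zone names are hypothetical.

```rust
// Hand-rolled stand-in for itertools' `exactly_one`: Ok only if the
// iterator yields exactly one item.
fn exactly_one<I: Iterator>(mut iter: I) -> Result<I::Item, &'static str> {
    match (iter.next(), iter.next()) {
        (Some(item), None) => Ok(item),  // exactly one element
        (None, _) => Err("no elements"),
        (Some(_), Some(_)) => Err("more than one element"),
    }
}

fn main() {
    // One NTP zone on a sled: the check passes and yields the zone.
    let zones = vec!["internal-ntp"];
    assert_eq!(exactly_one(zones.into_iter()), Ok("internal-ntp"));

    // Zero or two NTP zones: the check trips, as the reviewer intends
    // the `expect` in the suggestion above to do.
    assert!(exactly_one(Vec::<&str>::new().into_iter()).is_err());
    let zones = vec!["internal-ntp", "boundary-ntp"];
    assert!(exactly_one(zones.into_iter()).is_err());
}
```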
BlueprintZoneType::BoundaryNtp(
    blueprint_zone_type::BoundaryNtp {
        address, ..
    },
) => {
    let mut ntp_admin_addr = *address;
    ntp_admin_addr.set_port(NTP_ADMIN_PORT);
    Some(ntp_admin_addr)
}
BlueprintZoneType::InternalNtp(
    blueprint_zone_type::InternalNtp { address },
) => {
    let mut ntp_admin_addr = *address;
    ntp_admin_addr.set_port(NTP_ADMIN_PORT);
    Some(ntp_admin_addr)
}
Since both of these arms bind a field named address with the same type, I think you can combine them:
BlueprintZoneType::BoundaryNtp(
blueprint_zone_type::BoundaryNtp {
address, ..
},
)
| BlueprintZoneType::InternalNtp(
blueprint_zone_type::InternalNtp { address },
) => {
let mut ntp_admin_addr = *address;
ntp_admin_addr.set_port(NTP_ADMIN_PORT);
Some(ntp_admin_addr)
}
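The combined arm works because Rust or-patterns require each alternative to bind the same names with the same types, so a single arm body can use the shared binding. A minimal compiling sketch, using hypothetical stand-ins for the blueprint zone types and a placeholder value for NTP_ADMIN_PORT:

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

// Hypothetical stand-ins for the blueprint zone types; only the shape
// matters here: two variants that each carry an `address` field.
enum ZoneType {
    BoundaryNtp { address: SocketAddrV6, extra: bool },
    InternalNtp { address: SocketAddrV6 },
}

// Placeholder value, not the real constant from omicron.
const NTP_ADMIN_PORT: u16 = 8123;

fn ntp_admin_addr(zone: &ZoneType) -> SocketAddrV6 {
    match zone {
        // Both alternatives bind `address` with the same type, so the
        // or-pattern lets one arm body handle both variants.
        ZoneType::BoundaryNtp { address, .. }
        | ZoneType::InternalNtp { address } => {
            let mut addr = *address;
            addr.set_port(NTP_ADMIN_PORT);
            addr
        }
    }
}

fn main() {
    let base = SocketAddrV6::new(Ipv6Addr::LOCALHOST, 123, 0, 0);
    let boundary = ZoneType::BoundaryNtp { address: base, extra: true };
    let internal = ZoneType::InternalNtp { address: base };
    assert_eq!(ntp_admin_addr(&boundary).port(), NTP_ADMIN_PORT);
    assert_eq!(ntp_admin_addr(&internal).port(), NTP_ADMIN_PORT);
}
```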
@@ -724,17 +728,20 @@ impl ServiceInner {

    async fn wait_for_timesync(
        &self,
-       sled_addresses: &Vec<SocketAddrV6>,
+       ntp_admin_addresses: &Vec<SocketAddrV6>,
I realize this is consistent with what was already here, but: by accepting a list of socket addrs, we end up rebuilding an NtpAdminClient for every sled inside every attempt of the retry_notify loop. Maybe we should take a &[NtpAdminClient] instead of a list of socket addrs, so we only create the clients once? (I have a vague memory of reqwest clients being relatively expensive to construct, but maybe we solved that some other way?)
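The refactor the comment suggests is hoisting client construction out of the retry loop: build one client per address up front, then have the polling function borrow the slice. A minimal sketch with a hypothetical stand-in for NtpAdminClient (the real generated client wraps a reqwest client, which is the construction cost at issue):

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

// Hypothetical stand-in for the generated `NtpAdminClient`; here it
// just records the base URL it would talk to.
struct NtpAdminClient {
    base_url: String,
}

impl NtpAdminClient {
    fn new(addr: &SocketAddrV6) -> Self {
        NtpAdminClient { base_url: format!("http://{addr}") }
    }
}

// Build the clients once, before entering any retry loop.
fn build_clients(addrs: &[SocketAddrV6]) -> Vec<NtpAdminClient> {
    addrs.iter().map(NtpAdminClient::new).collect()
}

// The polling function borrows the slice, so each retry attempt
// reuses the same clients instead of reconstructing them.
fn wait_for_timesync(clients: &[NtpAdminClient]) {
    for client in clients {
        println!("polling {}", client.base_url);
    }
}

fn main() {
    let addrs = vec![SocketAddrV6::new(Ipv6Addr::LOCALHOST, 8123, 0, 0)];
    let clients = build_clients(&addrs);
    // In the real code this call would sit inside a retry_notify loop;
    // the clients outlive every attempt.
    wait_for_timesync(&clients);
}
```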
let stdout = running_ntp_zone
    .run_cmd(&["/usr/bin/chronyc", "-c", "tracking"])
    .map_err(TimeSyncError::ExecuteChronyc)?;
🎉 Love getting rid of this.
I pulled the SMF change into #8555.
Builds on #8555
This PR:
Fixes #8590